Towards A Dependency-Based Gold Standard For German Parsers: The TIGER Dependency Bank

Authors

  • Martin Forst
  • Nuria Bertomeu
  • Berthold Crysmann
  • Frederik Fouvry
  • Silvia Hansen-Schirra
  • Valia Kordoni
Abstract

In this paper we discuss the construction, features and intended uses of the TiGer DB. The TiGer DB is a dependency bank derived from the TiGer Treebank; it contains predicate-argument relations and several grammatical features that can be considered semantically meaningful. It is produced semi-automatically: the TiGer Treebank is first converted into an LFG f-structure bank, which is in turn converted into the TiGer DB. This allows for relatively rapid construction. The grammatical relations and features encoded in the TiGer DB are chosen to keep the mapping from parser output, e.g. LFG f-structures or HPSG feature structures, to dependency triples simple. Hence, the TiGer DB can be used as a gold standard for the evaluation of German parsers.
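To make the evaluation idea concrete, here is a minimal sketch of how parser output could be scored against a gold standard once both are expressed as dependency triples. The `Triple` layout, the relation labels, and the `score` function are assumptions made for illustration; they are not the TiGer DB file format or the authors' evaluation tool.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class Triple(NamedTuple):
    """One dependency triple. The field layout is hypothetical."""
    relation: str   # grammatical relation or feature name (illustrative labels)
    head: str       # governing predicate
    dependent: str  # argument, or feature value


def score(gold: Iterable[Triple], predicted: Iterable[Triple]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of predicted triples against gold triples."""
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection: a triple is matched at most as often as it occurs in both.
    matched = sum((gold_counts & pred_counts).values())
    n_pred = sum(pred_counts.values())
    n_gold = sum(gold_counts.values())
    precision = matched / n_pred if n_pred else 0.0
    recall = matched / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented example for "Maria liest ein Buch" (labels are illustrative only).
gold = [
    Triple("sb", "lesen", "Maria"),
    Triple("oa", "lesen", "Buch"),
    Triple("det", "Buch", "ein"),
    Triple("tense", "lesen", "pres"),
]
predicted = [
    Triple("sb", "lesen", "Maria"),
    Triple("oa", "lesen", "Buch"),
]

p, r, f = score(gold, predicted)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")  # prints "P=1.00 R=0.50 F=0.67"
```

Because the comparison works on plain triples, any parser whose output can be mapped to this form can be scored the same way, which is the property the abstract highlights.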

Related Resources

On Representing Dependency Relations – Insights from Converting the German TiGerDB

Research in parser evaluation has led to the creation of dependency resources such as the TiGer Dependency Bank, a semi-automatic conversion of a subset of the TIGER Treebank. We explore the relationship between the TiGerDB representation and a more surface-oriented dependency analysis of German and describe how we mapped and recoded the TiGerDB into a format more closely linked to the original...

An Out-of-Domain Test Suite for Dependency Parsing of German

We present a dependency conversion of five German test sets from five different genres. The dependency representation is made as similar as possible to the dependency representation of TiGer, one of the two big syntactic treebanks of German. The purpose of these test sets is to enable researchers to test dependency parsing models on several different data sets from different text genres. We dis...

Treebank-Based Acquisition of Multilingual Unification Grammar Resources

Deep unification- (constraint-)based grammars are usually hand-crafted. Scaling such grammars from fragments to unrestricted text is time-consuming and expensive. This problem can be exacerbated in multilingual broad-coverage grammar development scenarios. Cahill et al. (2002, 2004) and O’Donovan et al. (2004) present an automatic f-structure annotation-based methodology to acquire broad-coverage...

DCU 250 Arabic Dependency Bank: An LFG Gold Standard Resource for the Arabic Penn Treebank

This paper describes the construction of a dependency bank gold standard for Arabic, DCU 250 Arabic Dependency Bank (DCU 250), based on the Arabic Penn Treebank Corpus (ATB) (Bies and Maamouri, 2003; Maamouri and Bies, 2004) within the theoretical framework of Lexical Functional Grammar (LFG). For parsing and automatically extracting grammatical and lexical resources from treebanks, it is neces...

Preliminary Experiments in Polish Dependency Parsing

Preliminary experiments presented in this paper consist in the induction and evaluation of a dependency parser for Polish. We train data-driven dependency models with publicly available parser-generation systems (MaltParser and MSTParser) given a converted dependency structure bank for Polish. Induced Polish dependency parsers are evaluated against a set of gold standard dependency structures u...

Publication date: 2004